Abstract

This paper describes the relationship between the trading network and the WWW network from the perspective of the preferential attachment mechanism. This mechanism is known to be the underlying principle of network evolution and has been incorporated into two famous web page ranking algorithms, PageRank and HITS. We point out how this mechanism differs between the trading network and the WWW network, derive a HITS-based ranking algorithm for trading networks as a direct consequence of these differences, and apply the same framework back to the HITS formulation, which turns out to yield a technique for accelerating its convergence.



1 Introduction 

Research on preferential attachment and network structure can be dated back to the 1950s with the work of Solomonoff and Rapoport (1951), where the authors presented the first systematic study of a class of networks known as random graphs. The study of graphs itself has a long history in mathematics: Euler introduced the use of vertices and edges to model the famous Königsberg bridge problem in 1736. However, in contrast to the classical studies, modern network studies have some interesting additional features: (1) they focus on much larger problems that can contain millions of vertices, so it is natural to consider statistical properties of the networks; (2) they deal with real networks such as the Internet topology (Faloutsos et al., 1999), the WWW network (Albert et al., 1999; Broder et al., 2000), metabolic networks (Jeong et al., 2000; Wagner and Fell, 2001), scientific collaboration networks (Price¹, 1965; Newman, 2001; Barabási et al., 2002; Jeong et al., 2003), and epidemic spreading networks (Ball et al., 1997; Keeling, 1999; Kuperman and Abramson, 2001; Pastor-Satorras and Vespignani, 2001), among others; and (3) they study dynamical properties of networks, as many real networks are not static entities but grow according to some rules (Bianconi and Barabási, 2001; Barabási et al., 2002; Jeong et al., 2003).
The foundation of modern random graph theory, which focuses on the structure and statistical properties of very large random graphs, was laid by Erdős and Rényi (Erdős and Rényi, 1959, 1960, 1961). The random graph is a very influential model of real networks because it can describe many phenomena, including phase transitions, short paths between most pairs of vertices, and the existence of a giant component. Prior to the discovery of scale-free networks (Barabási and Albert, 1999), many network designs were based on the random graph model, including the design of Internet data protocols (Newman et al., 2006) and the setting of social network experiments (Travers and Milgram, 1969; Pool and Kochen, 1978).



The first widely known challenge to the random graph came from the study of the WWW network topology by Albert, Jeong, and Barabási (Albert et al., 1999; Barabási and Albert, 1999). While the first paper focuses mainly on the short paths between any pair of pages, supporting the finding of Watts and Strogatz (1998), the second paper is the first to explicitly challenge the effectiveness of the random graph in modeling real networks. It shows experimentally the ubiquitous existence of power-law degree distributions, $p_k \propto k^{-\gamma}$, where $p_k$ is the probability that a vertex has degree $k$ and $\gamma$ is an exponent that usually lies in the range 2 to 3 for real networks, in a variety of real networks including the WWW network, scientific collaboration networks, and film actor networks. This finding led to the famous scale-free hypothesis.

The power-law degree distributions found in many real networks are considered the most important evidence of the discrepancy between the random graph prediction of the degree distribution (which should follow the Poisson distribution, $p_k = m^k e^{-m}/k!$, where $m$ is the mean degree of the vertices) and the real situation. This discrepancy outweighs the other good predictions made by the random graph, including short paths, the diameter of the graph, phase transitions, and the size of the giant component. It has generated a very large number of scientific publications on such networks in fields ranging from mathematics, physics, and computer science to economics and sociology, and in turn has created a new field of study: complex networks.

There are several fundamental reasons behind the interest in the ubiquity of the scale-free phenomenon. We list some of them here:

1. Unlike for the Poisson distribution, the mean and standard deviation of a power-law distribution do not indicate its central tendency and dispersion. These metrics are usually very useful for describing a distribution without having to plot it. In the case of the power law, however, they can be misleading, because the mean does not reflect where the data are centred and the standard deviation does not tell us the range where most of the data lie.

2. There is no peak value in the power law: $p_k$ decreases monotonically as $k$ increases.



¹ We must note here that Price (1965) was actually the first to show that a real network, the scientific collaboration network, follows a power-law degree distribution, and his finding came long before the works of Barabási et al. (Barabási et al., 2002; Jeong et al., 2003). Interestingly, Price did not seem to know about the famous random graph model of Erdős and Rényi, and Barabási et al. did not know about the earlier work of Price when studying scientific collaboration networks.
Figure 1: Poisson and power-law distribution plots for several values of $m$ and $\gamma$, respectively: (a) Poisson distribution, (b) power-law distribution. The x-axis is the degree $k$, the y-axis is the probability $p_k$, and (b) is in log-log scale.



3. The power law has a heavy tail that decays much more slowly than the Poisson distribution, so there are some vertices with very high degree. These vertices are the hubs, and they play a role in maintaining the integrity and robustness of the network.
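To make the contrast between the two distributions concrete, the following Python sketch compares the tails of a Poisson distribution and a discrete power law. The exponent $\gamma = 2.5$, the cutoff `k_max`, and the helper names are illustrative choices, not values taken from the paper.

```python
import math


def poisson_pmf(k, m):
    """p_k = m^k e^{-m} / k!, evaluated in log space to avoid overflow."""
    return math.exp(k * math.log(m) - m - math.lgamma(k + 1))


def power_law_pmf(gamma, k_max):
    """Discrete power law p_k proportional to k^{-gamma} on k = 1..k_max, normalised to sum to 1."""
    weights = {k: k ** -gamma for k in range(1, k_max + 1)}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}


def tail(pmf, k0):
    """P(k >= k0) for a distribution stored as {k: p_k}."""
    return sum(p for k, p in pmf.items() if k >= k0)
```

For example, with `pl = power_law_pmf(2.5, 10_000)` (mean degree about 2) and `pois = {k: poisson_pmf(k, 2.0) for k in range(200)}`, `tail(pl, 20)` exceeds `tail(pois, 20)` by many orders of magnitude: under the Poisson model a vertex of degree 20 is essentially impossible, whereas the power law still produces hubs.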

Of the three reasons above, the existence of hubs is considered the most important and surprising finding because:

1. Prior to the works of Barabási and Albert (1999), it was very natural to think that real datasets in general follow Poisson-like distributions (including the binomial and normal, or Gaussian, distributions).

2. The existence of very large hubs implies that there is virtually no limit on the number of edges a vertex can create and receive (this is the reason for the term scale-free chosen by Barabási and Albert).

3. There should be some fundamental principles that govern the evolution of scale-free networks. These principles are described as growth and preferential attachment (Barabási and Albert, 1999), where the probability of receiving a new edge is proportional to the number of edges a vertex already has.
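A minimal sketch of growth with preferential attachment, in the spirit of Barabási and Albert (1999), assuming an undirected network that starts from a small complete graph and adds one vertex with `m` edges per step. The function names and parameters are hypothetical, and resampling duplicate targets is a common simplification rather than part of the original model.

```python
import random
from collections import Counter


def grow_preferential(n, m, seed=0):
    """Undirected growth with preferential attachment (Barabási-Albert style).

    Starts from a complete graph on m + 1 vertices; every new vertex then adds
    m edges to distinct existing vertices chosen with probability proportional
    to their current degree."""
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each vertex appears in `endpoints` once per incident edge, so a uniform
    # draw from this list selects vertex v with probability deg(v) / (2|E|).
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # resample duplicates (a simplification)
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degree_distribution(edges):
    """Empirical p_k as {k: fraction of vertices with degree k}."""
    degree = Counter(v for edge in edges for v in edge)
    counts = Counter(degree.values())
    return {k: c / len(degree) for k, c in sorted(counts.items())}
```

Plotting `degree_distribution(grow_preferential(10_000, 2))` on log-log axes should give an approximately straight line (the Barabási-Albert model predicts an exponent close to 3), unlike the Poisson shape of a random graph with the same mean degree.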

In this paper, we study the preferential attachment mechanism in trading networks. Using the principle of supply and demand, we show that preferential attachment in trading networks is the opposite of the corresponding mechanism in the WWW network. Because preferential attachment is the principle behind the formulation of link structure ranking algorithms like PageRank and HITS (see Section 3 for details), we use these differences to define a ranking algorithm for trading networks. The proposed algorithm is HITS-based because there are two types of transactions to be captured, sales and purchases. In network terms, sales are equivalent to creating new outlinks and purchases are equivalent to receiving new inlinks (see Section 4 for details about resource flows). Of these algorithms, only HITS, which produces two types of scores, authority scores corresponding to inlinks (purchases) and hub scores corresponding to outlinks (sales), can be extended to trading networks. Finally, by using the same framework as in deriving the proposed algorithm for trading networks, we present a new approach to accelerating HITS computation. Preliminary results can be found in (Mirzal, 2009a,b).



2 Preferential attachment 



Preferential attachment is a concept introduced by Yule in 1925 (Yule, 1925) and later used to describe a class of mechanisms in which the probability of receiving a quantity is proportional to the amount of that quantity the object already has. It has appeared in several fields under different names: in information science it is known as cumulative advantage (Price, 1976), in sociology as the Matthew effect (Merton, 1968), and in economics as the Gibrat principle (Simon, 1955). Preferential attachment has long been known to be the principle behind the power-law distributions exhibited by some real datasets, for example the distribution of wealth accumulation (Reed, 2001), the distribution of the number of species per genus (Yule, 1925), the distributions of word frequencies in books and documents (Zipf, 1935; Konchady, 2006), and the number of collaborators in scientific collaboration networks (Price, 1965; Newman, 2001; Barabási et al., 2002; Jeong et al., 2003), among others.

However, the presence of preferential attachment in network evolution does not always produce a power-law degree distribution. If there are constraints on generating new edges, the degree distribution usually does not follow the power law, because few vertices can reach a very high degree. It usually does not follow the Poisson distribution either; instead, it has a non-power-law but still right-skewed degree distribution. For example, power grids and air traffic networks have exponential distributions, friendship networks have Gaussian distributions, and the movie actor network has an exponentially truncated power-law distribution (Amaral et al., 2000).



Barabási et al. provide a robust test to detect preferential attachment in network evolution by observing the change of degree $\Delta k$ as a function of $k$, $\Delta k \propto k^{\nu}$, for every vertex over some time intervals (which requires dynamic data that is not always available). If preferential attachment exists, $\nu$ will be greater than 0; for a perfectly scale-free network, $\nu = 1$; and if the mechanism does not exist, $\nu = 0$. (Strictly, Barabási et al. (2002) define the test using the integral of the probability of receiving new edges, $\pi(k) = \int_0^{k} \Pi(k')\,dk'$ where $\Pi(k') \propto k'^{\nu}$.) Unlike the usual simple test of only plotting $p_k$, this method can distinguish networks with preferential attachment from mere random graphs with power-law degree distributions (Newman et al., 2001). However, on many occasions it is sufficient to use the $p_k$ distribution, and this is in fact the most common approach used by researchers to detect the presence of preferential attachment in real networks (Faloutsos et al., 1999; Broder et al., 2000; Jeong et al., 2000; Wagner and Fell, 2001; Liljeros et al., 2001; Newman, 2001; Pastor-Satorras and Vespignani, 2001; Barabási and Albert, 1999; Price, 1965; Kleinberg et al., 1999; Albert and Barabási, 2000; Price, 1976; Merton, 1968; Zipf, 1935).
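The $\Delta k \propto k^{\nu}$ test can be sketched as an ordinary least-squares fit of $\log \Delta k$ against $\log k$. This is the plain log-log regression, not the cumulative (integrated) variant used by Barabási et al. (2002); the function name and the choice to drop vertices with $k = 0$ or $\Delta k = 0$ are assumptions of this sketch.

```python
import math


def estimate_nu(k_before, delta_k):
    """Least-squares slope of log(delta_k) versus log(k_before).

    k_before[v] is the degree of vertex v at the start of the interval and
    delta_k[v] the number of edges it gained during the interval. Vertices with
    k = 0 or delta_k = 0 cannot be placed on a log-log plot and are dropped."""
    pts = [(math.log(k), math.log(d)) for k, d in zip(k_before, delta_k) if k > 0 and d > 0]
    if len(pts) < 2:
        raise ValueError("need at least two vertices with k > 0 and delta_k > 0")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("all retained vertices have the same degree")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx  # estimate of nu
```

An estimate near 1 points to linear preferential attachment and one near 0 to its absence. Because real data are noisy, one would typically average $\Delta k$ over vertices of the same degree before fitting.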
3 PageRank and HITS



Around the same time as the discovery of preferential attachment in network evolution, two groups of researchers began to realize the role of the link structure of the WWW network in determining the value of pages. Links in the WWW network are hyperlinks created by site owners to point to other relevant pages, favorite pages, popular pages, or pages that contain useful information (this was especially true at the beginning of the WWW era, when most hyperlinks were created by humans and link spammers were rare). So hyperlinks reflect opinions about the value of pages: the more valuable a page, the more inlinks it has. Thus, the hyperlink structure can be used to distinguish important pages from less important ones.

The first link structure ranking algorithm, known as PageRank, was proposed by Brin and Page (Brin et al., 1999; Brin and Page, 1998). It is a popularity measure based on the hypothesis of a random surfer who follows the hyperlink structure of the WWW network indefinitely. In the long run, the proportion of time the random surfer spends on a page depends on the number of inlinks the page has and on the number of inlinks of the pages that point to it. This is intuitive because the number of inlinks of a page reflects its reachability from other pages. And because the proportion of time the surfer spends on a page reflects the value of the page, the PageRank score of a page is proportional to the number of inlinks the page has and to the number of inlinks of the pages that point to it. On the other hand, because hyperlinks are opinions or recommendations made by site owners about other pages, the value of each recommendation should be diluted if there are too many of them on a page. Thus, the PageRank score of a page is inversely proportional to the number of outlinks of the pages that point to it. Mathematically, PageRank is defined by the following equation:

$$pr_i = \sum_{j \in B_i} \frac{pr_j}{outdeg_j} \qquad (1)$$

where $pr_i$ denotes the PageRank score of page $i$, $outdeg_j$ denotes the outdegree of page $j$, and $B_i$ denotes the set of pages that point to $i$.

The above equation is a circular statement: the score of a page depends on the scores of the pages that point to it, and in turn the scores of those pages depend on the scores of the pages that point to them. To solve it, an iterative procedure is usually employed, with each page given an initial value (usually $1/N$, where $N$ is the number of pages):

$$pr_i^{(k+1)} = \sum_{j \in B_i} \frac{pr_j^{(k)}}{outdeg_j}, \qquad k = 1, \ldots, K \qquad (2)$$

where $K$ denotes the final iteration, at which a predefined stopping criterion is satisfied. To get a more compact form, Equation (2) is rewritten in matrix form:

$$pr^{(k+1)T} = pr^{(k)T} D_o^{-1} L \qquad (3)$$

where $D_o = \mathrm{diag}(outdeg_1, \ldots, outdeg_N)$, $pr^{(k)T}$ denotes the $1 \times N$ PageRank vector at iteration $k$, and $L$ denotes the adjacency matrix induced from the WWW network, where $[L]_{ij} = 1$ if there is a hyperlink from $i$ to $j$, and 0 otherwise.
Equation (3) is the power method (Barrett et al., 1994) applied to finding the dominant left eigenvector of $D_o^{-1} L$. By Markov chain theory, Equation (3) is guaranteed to converge to a unique positive PageRank vector $pr$ if $D_o^{-1} L$ is stochastic, irreducible, and aperiodic (Langville and Meyer, 2006).



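A matrix is stochastic iff its entries are nonnegative and every row sums to 1; in particular, it has no zero row. So the first adjustment is to turn $D_o^{-1} L$ into a stochastic matrix, which requires handling dangling pages (pages with no outlinks, whose rows in $D_o^{-1} L$ are zero). Let $d$ be the $N \times 1$ dangling vector whose $n$th entry ($n = 1, 2, \ldots, N$) is 1 if $n$ is a dangling page and 0 otherwise, and let $e^T$ be the $1 \times N$ all-ones vector. The stochastic version of $D_o^{-1} L$ is $S = D_o^{-1} L + (1/N) d e^T$.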

A matrix is irreducible iff its directed graph is strongly connected, i.e. for every ordered pair of vertices there is a directed path from the first to the second. An irreducible nonnegative matrix is aperiodic iff it has only one eigenvalue on its spectral circle. Irreducibility and aperiodicity can be enforced by replacing all zero entries of $S$ with small positive numbers. Thus, the stochastic, irreducible, and aperiodic version of $D_o^{-1} L$ is $P = \alpha S + (1/N)(1 - \alpha) e e^T$, where $0 < \alpha < 1$ is a scalar that controls the proportion of time the random surfer follows hyperlinks as opposed to teleporting (usually set to 0.85). Equation (3) can then be rewritten as:

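$$pr^{(k+1)T} = pr^{(k)T} P = \alpha\, pr^{(k)T} D_o^{-1} L + (1/N)\left(\alpha\, pr^{(k)T} d + 1 - \alpha\right) e^T \qquad (4)$$

where the second equality uses $pr^{(k)T} e = 1$.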
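The power iteration in Equation (4) can be sketched in plain Python as follows, assuming a small dense 0/1 adjacency list `adj` (with `adj[i][j] = 1` for a link from page $i$ to page $j$) and the 1-norm residual as the stopping criterion; these names and the tolerance are illustrative. Dangling pages are handled through the $d$ term, so neither $S$ nor $P$ is ever formed.

```python
def pagerank(adj, alpha=0.85, tol=1e-8, max_iter=1000):
    """Power method for Equation (4).

    adj[i][j] = 1 if page i links to page j (the matrix L). Returns pr as a
    list that sums to 1."""
    n = len(adj)
    outdeg = [sum(row) for row in adj]
    pr = [1.0 / n] * n  # initial vector (1/N) e^T
    for _ in range(max_iter):
        # pr^T d: total score currently held by dangling pages (outdeg = 0)
        dangling = sum(pr[i] for i in range(n) if outdeg[i] == 0)
        # (1/N)(alpha * pr^T d + 1 - alpha) e^T
        new = [(alpha * dangling + 1.0 - alpha) / n] * n
        # alpha * pr^T D_o^{-1} L: each page splits its score over its outlinks
        for i in range(n):
            if outdeg[i]:
                share = alpha * pr[i] / outdeg[i]
                for j in range(n):
                    if adj[i][j]:
                        new[j] += share
        residual = sum(abs(x - y) for x, y in zip(new, pr))
        pr = new
        if residual < tol:
            break
    return pr
```

Because each iterate keeps summing to 1, the $(1 - \alpha)$ term can use the constant $1/N$ exactly as in Equation (4).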



The second ranking algorithm, HITS (Hyperlink-Induced Topic Search), was introduced by Kleinberg (Kleinberg, 1999). Unlike PageRank, HITS produces two metrics for every page, authority and hub. Authority scores measure a page's popularity, and hub scores are used to find portal pages, i.e. pages that link to popular (and thus useful) pages.

HITS is defined by the following statement: the authority score of a page is the sum of the hub scores of the pages that point to it, and the hub score of a page is the sum of the authority scores of the pages it points to (Kleinberg, 1999). Like PageRank, this is a circular statement: the authority scores depend on the hub scores and vice versa. To solve it, the following iteration is used:



$$a_i^{(k+1)} = \sum_{j \in B_i} h_j^{(k)} \quad \text{and} \quad h_i^{(k+1)} = \sum_{j \in T_i} a_j^{(k+1)} \qquad (5)$$

where $a_i^{(k)}$ and $h_i^{(k)}$ denote the authority and hub scores of page $i$ at iteration $k$, $B_i$ denotes the set of pages that point to $i$, and $T_i$ denotes the set of pages that $i$ points to. In matrix form, the HITS formulation can be rewritten as:

$$a^{(k+1)T} = h^{(k)T} L \quad \text{and} \quad h^{(k+1)T} = a^{(k+1)T} L^T \qquad (6)$$

where $a^T$ denotes the $1 \times N$ authority vector and $h^T$ denotes the $1 \times N$ hub vector.

In HITS, both the authority matrix $L^T L$ (from $a^{(k+1)T} = a^{(k)T} L^T L$) and the hub matrix $L L^T$ (from $h^{(k+1)T} = h^{(k)T} L L^T$) are nonnegative. Thus, by the Perron-Frobenius theorem for nonnegative matrices, nonnegative $a^T$ and $h^T$ exist, but there is no guarantee of their uniqueness. To ensure uniqueness, the authority and hub matrices must be modified into positive matrices (Farahat et al., 2006; Langville and Meyer, 2006). Let $A$ and $H$ be the positive versions of the authority matrix and the hub matrix, respectively. We can define them as $A = \xi L^T L + (1/N)(1 - \xi) e e^T$ and $H = \xi L L^T + (1/N)(1 - \xi) e e^T$, where $0 < \xi < 1$ is a constant that should be set close to 1 to preserve the hyperlink structure information. Thus, unique and positive authority and hub vectors can be calculated by using

$$a^{(k+1)T} = a^{(k)T} A \quad \text{and} \quad h^{(k+1)T} = h^{(k)T} H.$$

Figure 2: Labeled-link network model of the trading activities.
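A sketch of HITS computed with the positive matrices $A$ and $H$, assuming a dense 0/1 adjacency list and normalizing each iterate to sum to 1 (the normalization and the default $\xi = 0.9$ are choices of this sketch, not values from the text). The rank-one term is applied implicitly, so $A$ and $H$ are never formed.

```python
def _row_times_L(v, adj):
    """(v^T L)_j = sum_i v_i L_ij."""
    n = len(adj)
    return [sum(v[i] * adj[i][j] for i in range(n)) for j in range(n)]


def _row_times_LT(v, adj):
    """(v^T L^T)_i = sum_j v_j L_ij."""
    return [sum(x * y for x, y in zip(row, v)) for row in adj]


def _power_method(apply_core, n, xi, tol, max_iter):
    """Iterate v^T <- v^T (xi * C + (1/N)(1 - xi) e e^T), normalised to sum 1,
    where apply_core(v) returns v^T C for C = L^T L or C = L L^T."""
    v = [1.0 / n] * n
    for _ in range(max_iter):
        total = sum(v)  # v^T e
        w = [xi * x + (1.0 - xi) * total / n for x in apply_core(v)]
        s = sum(w)
        w = [x / s for x in w]
        if sum(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def hits(adj, xi=0.9, tol=1e-8, max_iter=1000):
    """Authority vector from a^T A and hub vector from h^T H."""
    n = len(adj)
    authority = _power_method(lambda v: _row_times_L(_row_times_LT(v, adj), adj), n, xi, tol, max_iter)
    hub = _power_method(lambda v: _row_times_LT(_row_times_L(v, adj), adj), n, xi, tol, max_iter)
    return authority, hub
```

For a star in which pages 0 and 1 both link to page 2, page 2 receives the highest authority score while pages 0 and 1 share the highest hub score.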

4 Link structure ranking algorithm for trading networks

Trading activities are exchanges of different goods and/or services (we will refer to goods and services as resources for the rest of the paper) involving at least two agents. These activities can be modeled with a labeled-link network where the vertices are the agents and the directed edges are the flows of resources. Figure 2 shows the network model of the trading activities. Note that transactions are actually mutual: there are two opposite flows for each transaction, the flow of the resource and the flow of the payment. However, because price is a better unit of account in the market and is generally used to measure the quantity of resources, each transaction can be described by only one directed edge, the flow of the resource, weighted by its price.

There are some differences between the trading network and the WWW network that are worth noting. First, in a trading network every vertex has at least one type of resource, and a new edge is created when two vertices exchange their resources. Consequently, the amount of resources limits the number and the weight of the edges a vertex can have. In the WWW network, on the other hand, creating an edge simply means creating a new hyperlink on a web page, so no resource needs to be allocated. Second, unlike in the trading network, edge creation in the WWW network is not a mutual process: if page A has a hyperlink to page B, it does not follow that B also has a hyperlink to A. Third, every edge in a trading network is labeled with a resource description and weighted with the price, whereas the edges in the WWW network are usually unlabeled² and weighted with either 1 or 0. Fourth, while the purpose of creating edges in a trading network is to maximize the transaction benefits, in the WWW network it is to get hyperlinks from popular pages.

The last difference is directly related to the preferential attachment mechanisms. Before we define preferential attachment in the trading network, we list some assumptions that have to be made in order to simplify the complex interactions among agents.

1. All transactions are carried out under the ceteris paribus condition, so the prices depend only on the demand and supply of the corresponding resources, not on other substitute or complementary resources.

2. The perfect market condition is met and the prices have already reached equilibrium.

3. The amount of resources owned by an agent is reflected in its buying and selling volumes of the corresponding resources.

The first assumption allows us to form and analyze one network for each resource independently. The second assumption guarantees that resource availability, not price differences, is the main motivation in choosing business partners. And the third assumption allows us to estimate resource availability for future transactions from the current and past buying and selling volumes of the corresponding resources, which are reflected in the weights of the inlinks and outlinks.

Note that the first and second assumptions are very common in trading network analysis and in economics in general, so we will only discuss the reasons behind the last assumption. The last assumption is the heart of the proposed algorithm's formulation because it allows us to (1) model trading activities completely with a labeled-link network, which is a standard model in graph theory, (2) relate the amount of resources owned by an agent to the weights of the corresponding inlinks and outlinks, and (3) define preferential attachment in the trading network by using the number of inlinks and outlinks (more specifically, the total weights of those links) so that it can be compared with the preferential attachment in the WWW network induced from the HITS formulation (see Figure 3), which in turn allows us to formulate a ranking algorithm for trading networks.

4.1 Proposed algorithm formulation 

In trading activities, there are costs associated with every transaction. Thus, every agent must implement an optimal preferential attachment strategy to maximize its benefits. In practice, every transaction conducted by an agent influences its financial state, including transactions in different resources. However, by assumption 1 we can isolate these influences and form

² There are some works devoted to the analysis of WWW network labels (the hypertexts), for example Eiron and McCurley (2003), Kolda et al. (2005), and Fujii (2008). But because hyperlinks are recommendations, they are alike, and in some cases their labels can be safely ignored, including in the calculation of PageRank and HITS. Conversely, in a trading network the labels are inherent information of the transactions and cannot be ignored.
Figure 3: The preferential attachment mechanisms in the trading network and the WWW network (HITS's version).



one labeled-link network for each resource. So, if there are $x$ types of resources traded among the agents, there will be $x$ labeled-link networks that can be analyzed separately. Then, by assumption 2, each agent should buy (receive inlinks) from others with abundant resources, and should sell (create outlinks) to others that lack the resources. And by assumption 3, agents with abundant resources are the agents with many inlinks, and agents that lack the resources are the agents with many outlinks.
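Building the per-resource networks can be sketched as below, assuming transaction records of the hypothetical form `(seller, buyer, resource, value)`, where `value` is the price-weighted volume. Each edge points in the direction of the resource flow (seller to buyer), repeated transactions between the same pair are summed, and, following assumption 1, every resource gets its own matrix.

```python
from collections import defaultdict


def build_resource_networks(transactions, agents):
    """One weighted adjacency matrix per resource.

    transactions: iterable of (seller, buyer, resource, value) tuples, with
    value the price-weighted volume of the transaction. The edge follows the
    resource flow, so it goes from seller to buyer; repeated transactions are
    summed into the same entry."""
    index = {agent: i for i, agent in enumerate(agents)}
    n = len(agents)
    networks = defaultdict(lambda: [[0.0] * n for _ in range(n)])
    for seller, buyer, resource, value in transactions:
        networks[resource][index[seller]][index[buyer]] += value
    return dict(networks)
```

Each resulting matrix plays the role of the weighted adjacency matrix $L$ for one resource, with `adj[i][j]` as $[L]_{ij}$.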

Thus, we can define preferential attachment in the trading network with the following statement: an agent should receive new inlinks from others with many inlinks and should create new outlinks to others with many outlinks. This statement is interesting because it resembles HITS's version of preferential attachment in the WWW network. As discussed in Section 3, in HITS good authorities (pages with many inlinks) are pointed to by good hubs (pages with many outlinks), and good hubs point to good authorities. Thus, HITS's version of preferential attachment is: a page should receive new inlinks from others with many outlinks and should create new outlinks to others with many inlinks. Figures 3(a) and 3(b) show the preferential attachment in the trading network and the WWW network, respectively. As we can see, the preferential attachment in the trading network is the opposite of HITS's version of preferential attachment in the WWW network, so it can be used to formulate the proposed algorithm.

The proposed algorithm is defined by the following statement: a vertex becomes more important if it is pointed to by others with many inlinks and points to others with many outlinks. This statement is derived directly from the preferential attachment in the trading network defined above. By comparing the preferential attachments in both networks (see Figure 3) with the HITS formulation (see Equation (5)), the proposed algorithm can be written as:



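$$r_i^{(k+1)} = \beta \sum_{j \in B_i} r_j^{(k)} ca_j + (1 - \beta) \sum_{j \in T_i} r_j^{(k)} ch_j, \quad \text{where} \qquad (7)$$

$$ca_i = \frac{indeg_i}{deg_i} \left| indeg_i - outdeg_i \right|^{p_i}, \qquad (8)$$

$$ch_i = \frac{outdeg_i}{deg_i} \left| indeg_i - outdeg_i \right|^{-p_i}, \quad \text{and} \qquad (9)$$

$$p_i = \begin{cases} 1 & \text{if } indeg_i > outdeg_i \\ -1 & \text{if } indeg_i < outdeg_i \\ 0 & \text{otherwise} \end{cases} \qquad (10)$$

where $r_i^{(k)}$ denotes the ranking score of vertex $i$ at iteration $k$; $indeg_i$, $outdeg_i$, and $deg_i = indeg_i + outdeg_i$ denote the indegree, outdegree, and degree of $i$ (so that $|indeg_i - outdeg_i|^{p_i} = 1$ when $p_i = 0$); and $0 < \beta < 1$ is a scalar used to determine which type of link is more important. If outlinks (sales) are more important than inlinks (purchases), $\beta < 0.5$; if inlinks are more important than outlinks, $\beta > 0.5$; and $\beta = 0.5$ otherwise.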
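Equations (8) to (10) can be computed directly from a weighted adjacency matrix, as in the sketch below. It assumes that `adj[i][j]` holds the total price-weighted flow from agent $i$ to agent $j$, so that $indeg$ and $outdeg$ are total link weights, and that every agent has at least one link ($deg_i > 0$); the function name is hypothetical.

```python
def attachment_constants(adj):
    """Return (ca, ch, p) for every agent following Equations (8)-(10).

    adj[i][j] is the price-weighted flow from agent i to agent j, so the
    in/out degrees computed here are total link weights."""
    n = len(adj)
    outdeg = [sum(adj[i]) for i in range(n)]                      # row sums
    indeg = [sum(adj[i][j] for i in range(n)) for j in range(n)]  # column sums
    ca, ch, p = [], [], []
    for i in range(n):
        deg = indeg[i] + outdeg[i]  # assumed > 0 (agent trades at least once)
        diff = abs(indeg[i] - outdeg[i])
        # Equation (10): sign of indeg - outdeg
        p_i = 1 if indeg[i] > outdeg[i] else (-1 if indeg[i] < outdeg[i] else 0)
        # |indeg - outdeg|^{p_i}; when p_i = 0 this is 1 (0 ** 0 == 1 in Python)
        k_ii = diff ** p_i
        ca.append(indeg[i] / deg * k_ii)   # Equation (8)
        ch.append(outdeg[i] / deg / k_ii)  # Equation (9)
        p.append(p_i)
    return ca, ch, p
```

For the chain A to B (weight 10) and B to C (weight 2), agent A, a pure seller, gets $ca = 0$ and $ch = 10$, while agent C, a pure buyer, gets $ch = 0$.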

The constants $ca$ and $ch$ are introduced to favour preferential attachment. As shown in Equations (8) and (9), $ca$ is bigger for vertices with many inlinks, and $ch$ is bigger for vertices with many outlinks. Thus, by Equation (7), vertices that are pointed to by others with many inlinks and point to others with many outlinks (following preferential attachment) will have bigger scores than vertices that do the opposite (not following preferential attachment).

Note that the first term on the right-hand side of Equation (7), $\sum_{j \in B_i} r_j^{(k)} ca_j$, describes the fraction of the score a vertex receives through its inlinks, and the second term, $\sum_{j \in T_i} r_j^{(k)} ch_j$, describes the fraction of the score a vertex receives through its outlinks. So the first term can be regarded as the authority part and the second term as the hub part.

The proposed algorithm will be represented in matrix form so that the adjustments necessary to ensure convergence can be applied. Let $M = \beta F + (1 - \beta) G$, where $F = K D^{-1} D_i L$ is the authority part and $G = K^{-1} D^{-1} D_o L^T$ is the hub part. Then Equation (7) can be rewritten as:

$$r^{(k+1)T} = r^{(k)T} M = r^{(k)T} \left( \beta K D^{-1} D_i L + (1 - \beta) K^{-1} D^{-1} D_o L^T \right) \qquad (11)$$

where $L$ denotes the induced adjacency matrix, $r^{(k)T}$ denotes the $1 \times N$ ranking vector at iteration $k$, $D_i = \mathrm{diag}(indeg_1, \ldots, indeg_N)$, $D_o = \mathrm{diag}(outdeg_1, \ldots, outdeg_N)$, $D = D_i + D_o$, and $K$ is a diagonal matrix with $[K]_{ii} = |(D_i - D_o)_{ii}|^{p_i}$. Note that, unlike in the WWW network, in the trading network the entries of $L$ are the weights of the corresponding links, which are usually nonnegative real numbers.

As shown in Equation (11), $M$ has no zero row provided every agent both buys and sells, but it is not a stochastic matrix because its rows are not normalized. Therefore, a stochasticity adjustment is required. Let $N$ be a diagonal matrix with $[N]_{ii} = \sum_{j \in V} [M]_{ij}$ (where $V$ denotes the set of all vertices in the network); the stochastic version of $M$ can then be written as $\bar{M} = N^{-1} M$. The irreducibility and aperiodicity adjustments can be made by replacing all zero entries of $\bar{M}$ with small positive numbers: $R = \zeta \bar{M} + (1/N)(1 - \zeta) e e^T$, where $0 < \zeta < 1$ plays the role of $\alpha$ in PageRank and should be set close to 1. Thus, the proposed algorithm can be rewritten as:

$$r^{(k+1)T} = r^{(k)T} R \qquad (12)$$



As we can see, $R$ has the same structure as $P$ in Equation (4), and by choosing a positive initial vector (for example $r^{(1)T} = (1/N) e^T$), Equation (12) is guaranteed to converge to a unique positive ranking vector $r^{(K)T}$ (Farahat et al., 2006).
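Putting Equations (11) and (12) together, the whole procedure can be sketched as follows. The entry-wise form $[M]_{ij} = \beta\, ca_i [L]_{ij} + (1 - \beta)\, ch_i [L]_{ji}$ follows from $F = K D^{-1} D_i L$ and $G = K^{-1} D^{-1} D_o L^T$. Two choices are assumptions of this sketch rather than statements from the text: every agent is assumed to have at least one link, and any all-zero row of $M$ (which arises in this formulation for an agent that only buys or only sells, since both of its terms then vanish) is replaced by a uniform row, by analogy with the dangling-page fix in PageRank. Names and defaults are illustrative.

```python
def trading_rank(adj, beta=0.5, zeta=0.85, tol=1e-8, max_iter=1000):
    """Power method for r^T R with R = zeta * N^{-1} M + (1/N)(1 - zeta) e e^T.

    adj[i][j] is the price-weighted flow from agent i to agent j (the matrix L)."""
    n = len(adj)
    outdeg = [sum(row) for row in adj]
    indeg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    ca, ch = [], []
    for i in range(n):
        deg = indeg[i] + outdeg[i]
        diff = abs(indeg[i] - outdeg[i])
        p = (indeg[i] > outdeg[i]) - (indeg[i] < outdeg[i])  # Equation (10)
        k_ii = diff ** p                                     # [K]_ii; 0 ** 0 == 1
        ca.append(indeg[i] / deg * k_ii)                     # Equation (8)
        ch.append(outdeg[i] / deg / k_ii)                    # Equation (9)
    # [M]_ij = beta * ca_i * L_ij + (1 - beta) * ch_i * L_ji
    m = [[beta * ca[i] * adj[i][j] + (1 - beta) * ch[i] * adj[j][i] for j in range(n)]
         for i in range(n)]
    # stochasticity adjustment N^{-1} M; zero rows become uniform (assumption)
    for i in range(n):
        s = sum(m[i])
        m[i] = [x / s for x in m[i]] if s > 0 else [1.0 / n] * n
    r = [1.0 / n] * n  # positive starting vector (1/N) e^T
    for _ in range(max_iter):
        total = sum(r)  # r^T e
        new = [zeta * sum(r[i] * m[i][j] for i in range(n)) + (1 - zeta) * total / n
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, r)) < tol:
            return new
        r = new
    return r
```

With $\beta = 0.5$, two agents that trade equal amounts with each other receive equal scores of 0.5, as symmetry requires.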

The proposed algorithm only accommodates flowing resources. If we have data about the amount of resources owned by the agents that does not come from transactions, for example natural resources like gas, oil, coal, and gold (we will refer to these as reserved resources for the rest of the paper), this information can also be included in the final scores. Let $u^T$ be a $1 \times N$ vector where $[u]_i$ corresponds to the amount of the reserved resource of agent $i$. Then the final ranking vector can be written as $f^T = c\, r^T + (1 - c)\, \bar{u}^T$, where $\bar{u}$ is the normalized version of $u$, and $0 < c < 1$ is a control parameter that determines which vector is more important.
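Combining the flow-based scores with the reserved resources is a convex combination, sketched below. Normalizing $u$ by its 1-norm is an assumption, since the text only says that $u$ is normalized; with this choice, the final vector sums to 1 whenever $r$ does. The function name is hypothetical.

```python
def final_scores(r, u, c=0.5):
    """f^T = c r^T + (1 - c) u_bar^T with u_bar = u / sum(u).

    r is the ranking vector from the power method (summing to 1) and u holds
    each agent's reserved resources; the 1-norm normalisation is an assumption."""
    if not 0 < c < 1:
        raise ValueError("c must lie strictly between 0 and 1")
    total = sum(u)
    if total <= 0:
        raise ValueError("u must contain a positive amount of reserved resources")
    return [c * ri + (1 - c) * ui / total for ri, ui in zip(r, u)]
```

For example, `final_scores([0.5, 0.3, 0.2], [0, 50, 50], c=0.8)` gives approximately `[0.4, 0.34, 0.26]`: the agent without reserves keeps only its flow-based share.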



We can also introduce into the final score a scaling constant associated with every agent, similar to the work of Bianconi and Barabási (2001), to describe the agent's competitiveness. These constants can be used not only to favour competitive agents, but also to handle some issues related to trading activities, such as reliability and trust.

Occasionally, agents' scores as buyers and/or sellers are more desirable than the overall scores. By inspecting Figure 3(a) and Equation (7), the ranking vectors as buyers, $b^T$, and as sellers, $s^T$, can be written as:

$$b^{(k+1)T} = b^{(k)T} C_a L \quad \text{and} \quad s^{(k+1)T} = s^{(k)T} C_h L^T \qquad (13)$$

where $C_a = \mathrm{diag}(ca_1, \ldots, ca_N)$ and $C_h = \mathrm{diag}(ch_1, \ldots, ch_N)$.
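Equation (13) can be iterated in the same way. The text does not state which adjustments are applied to $C_a L$ and $C_h L^T$, so this sketch only normalizes each iterate to sum to 1 and stops at a tolerance or an iteration cap; on networks where these matrices are reducible or periodic, a teleportation term like the one in $R$ would likely be needed. Names and defaults are illustrative.

```python
def buyer_seller_scores(adj, tol=1e-10, max_iter=10_000):
    """Iterate b^T <- b^T C_a L and s^T <- s^T C_h L^T (Equation (13)).

    adj[i][j] is the price-weighted flow from agent i to agent j; every agent
    is assumed to have at least one link."""
    n = len(adj)
    outdeg = [sum(row) for row in adj]
    indeg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    ca, ch = [], []
    for i in range(n):
        deg = indeg[i] + outdeg[i]
        diff = abs(indeg[i] - outdeg[i])
        p = (indeg[i] > outdeg[i]) - (indeg[i] < outdeg[i])
        ca.append(indeg[i] / deg * diff ** p)   # Equation (8)
        ch.append(outdeg[i] / deg / diff ** p)  # Equation (9)

    def iterate(step):
        v = [1.0 / n] * n
        for _ in range(max_iter):
            w = step(v)
            s = sum(w)
            w = [x / s for x in w]  # 1-norm normalisation (not stated in the text)
            if sum(abs(a - b) for a, b in zip(w, v)) < tol:
                return w
            v = w
        return v

    # (b^T C_a L)_j = sum_i b_i ca_i L_ij and (s^T C_h L^T)_j = sum_i s_i ch_i L_ji
    b = iterate(lambda v: [sum(v[i] * ca[i] * adj[i][j] for i in range(n)) for j in range(n)])
    s = iterate(lambda v: [sum(v[i] * ch[i] * adj[j][i] for i in range(n)) for j in range(n)])
    return b, s
```
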
4.2 Experimental results 

We examine the performance of the proposed algorithm using international trading datasets from the United Nations (United Nations, 1996, 1999). There are several good reasons for choosing these datasets. First, the networks are small compared with other datasets such as online auction networks, so the errors produced in each iteration can be minimized. Second, the classification of products is clear, so the adjacency matrix for every product can be constructed easily. And third, the prices of products in the same category are almost the same, which complies with the second assumption.

As stated earlier, $R$ is stochastic, irreducible, and aperiodic. Thus, the power method applied to Equation (12) is guaranteed to converge to a unique positive ranking vector $r^{(K)T}$ for any positive starting vector. The remaining question is therefore whether it converges to something that makes sense for measuring the degree of importance of agents in a trading network. We answer this question by calculating the similarity between the vector $r$ of our proposed algorithm and a standard measure, the vector $t$ of total exports and imports. This vector is chosen as the standard measure not only because it is the simplest and most common way of measuring the degree of importance, but also because the most active agents are usually the most connected ones, which are conventionally considered the most important vertices in graph theory. As similarity measures, the cosine criterion $\cos\theta$ and the Spearman rank-order correlation coefficient $\rho$ will be used:



cosd = w and 9 = 1 — (14) 

where ||x||_2 denotes the 2-norm of a vector x, and o(x) denotes the ordering induced from x. For example, if x = [0.3397, 0.1819, 0.3328], then o(x) = [1, 3, 2]. Thus, the cosine criterion measures how closely the two vectors point in the same direction, while the Spearman correlation measures the similarity between the orderings induced from the vectors.
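Equation (14) translates directly into code. The sketch below is a pure-Python illustration. The helper names cosine, ordering, and spearman are hypothetical. Ties are broken by position, so the closed-form ρ is exact only when the score vectors contain no ties.

```python
import math


def cosine(r, t):
    """cos(theta) = r^T t / (||r||_2 * ||t||_2), the first part of Equation (14)."""
    dot = sum(x * y for x, y in zip(r, t))
    norm_r = math.sqrt(sum(x * x for x in r))
    norm_t = math.sqrt(sum(y * y for y in t))
    return dot / (norm_r * norm_t)


def ordering(v):
    """o(v): rank of each entry, 1 for the largest; ties are broken by position."""
    order = sorted(range(len(v)), key=lambda i: -v[i])
    ranks = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def spearman(r, t):
    """rho = 1 - 6 * sum_i (o(r)_i - o(t)_i)^2 / (N (N^2 - 1)), exact without ties."""
    n = len(r)
    d2 = sum((x - y) ** 2 for x, y in zip(ordering(r), ordering(t)))
    return 1 - 6 * d2 / (n * (n * n - 1))


print(ordering([0.3397, 0.1819, 0.3328]))  # [1, 3, 2], as in the text
```

For data with tied scores, average ranks (as used by standard Spearman implementations) would be the safer choice.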

To gain insight into the computational performance, we compare the number of iterations the proposed algorithm needs to reach a given residual level with the numbers needed by PageRank and HITS. In the experiments, the residual level is set to 10^{-8} and β is set to 0.5. The number of iterations is used instead of computational time because trading networks are very small, so the power method takes negligible time. Table 1 summarizes the results. Table 2 lists the top ten countries in hydrogen peroxide trading, which is the least similar to the standard measure in the cosine criterion. Table 3 lists the top ten countries in medicinal products trading, which is the most similar.
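The convergence claim and the iteration-count protocol can be sketched with a generic power method. The matrix R of Equation (12) is not restated in this section. The 3 × 3 row-stochastic matrix below is therefore a hypothetical stand-in that is irreducible and aperiodic (it has positive diagonal entries). The function name power_method and the 1-norm residual are our assumptions. Runs from two different positive starting vectors reach the same vector, and the returned count is the kind of number reported in Table 1.

```python
def power_method(R, x0, tol=1e-8, max_iter=10_000):
    """Iterate x^T <- x^T R until the 1-norm residual drops below tol.

    R is assumed row-stochastic, irreducible and aperiodic.
    Returns (x, iterations).
    """
    n = len(R)
    total = sum(x0)
    x = [v / total for v in x0]  # rescale the positive start to sum to 1
    for k in range(1, max_iter + 1):
        x_new = [sum(x[j] * R[j][i] for j in range(n)) for i in range(n)]
        residual = sum(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if residual < tol:
            return x, k
    return x, max_iter


# Hypothetical stochastic, irreducible, aperiodic matrix (rows sum to 1).
R = [[0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4],
     [0.5, 0.5, 0.0]]
r1, k1 = power_method(R, [1.0, 1.0, 1.0])  # uniform start
r2, k2 = power_method(R, [5.0, 1.0, 2.0])  # another positive start
```

Both runs should approach approximately (0.328, 0.410, 0.262), the stationary vector of this particular R.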

As shown in Table 1, the proposed algorithm takes more iterations to converge on average. But trading networks are usually much smaller than the WWW network, so this is unlikely to become a problem (the computational times of all three algorithms are practically zero). Both similarity measures give promising results, with averages of 0.891 for the cosine criterion and 0.915 for the Spearman correlation. These high similarities are also supported by the top ten countries shown in Tables 2 and 3. Thus, it can be inferred that the proposed algorithm gives meaningful results in measuring the degree of importance of vertices in trading networks.

However, an important question arises concerning the usefulness of the proposed algorithm. If the total volumes already describe the degree of importance, one may question the point of using the proposed algorithm, which is clearly more expensive to compute. Before answering this question, we should make clear that, in general, the problem of assigning degrees of importance to the vertices of a graph has no single correct solution. Rather, the real issue is



Table 1: The performances of the proposed algorithm.

                                         #Iterations               Similarity
Data                 #Vert.   #Edg.    HITS   PR   Prop. Alg.    cos θ    ρ
Steel products          97     2627      26    54       42       0.862   0.874
Ethylene                43      169       7    44       54       0.849   0.916
Propylene               38      144      10    40      143       0.974   0.905
Sodium                  49      268      11    53      143       0.808   0.850
Hydrogen peroxide       47      261      51    61       99       0.752   0.902
Carbon                  51      535      22    37       65       0.912   0.929
Radio-active            53      717      25    23       26       0.884   0.927
Plastics                53     1410      20    37       39       0.985   0.968
Medicinal products      53     1504       9    18       14       0.989   0.965
Average                 54      848      20    41       69       0.891   0.915
Table 2: Top ten countries in hydrogen peroxide trading.

Ordered by standard measure        Ordered by proposed algorithm
Country           Score            Country           Score
Netherlands       0.132290         Japan             0.172970
Canada            0.095014         Norway            0.123360
United States     0.088694         Netherlands       0.114200
Moldova           0.065088         Canada            0.082261
Austria           0.059850         Turkey            0.053170
China             0.054194         United States     0.047059
Japan             0.048676         Rep. Korea        0.043684
Italy             0.045744         Moldova           0.038344
Colombia          0.037772         China             0.036916
Turkey            0.037353         Thailand          0.034545


Table 3: Top ten countries in medicinal products trading.

Ordered by standard measure        Ordered by proposed algorithm
Country           Score            Country           Score
Germany           0.133530         Germany           0.139490
United States     0.114520         United Kingdom    0.107270
United Kingdom    0.096001         United States     0.098509
France            0.092408         Switzerland       0.095938
Switzerland       0.083244         France            0.085463
Italy             0.067707         Italy             0.064711
Belg-Luxemb.      0.056696         Belg-Luxemb.      0.051169
Netherlands       0.051564         Netherlands       0.047270
Japan             0.049308         Ireland           0.043663
Sweden            0.033573         Sweden            0.041134



how to find a useful solution. This issue has been studied extensively for the WWW network. There, numerous methods exist, and they can roughly be classified into query-dependent and query-independent scores. For example, content scores are query-dependent, whereas PageRank is query-independent. If user satisfaction is taken as the standard of usefulness, PageRank seems to be more useful than HITS.

Hence, the main purpose of the proposed algorithm is to present a new method for computing ranking scores in trading networks. Such a method will become crucial for problems that involve finding the most important and relevant users in a large trading network, such as an online auction network. This is the recommendation problem, which has emerged as one of the most important problems in computer science research (Pan et al., 2006). Moreover, the proposed algorithm uses the network structure, which the total-volume method does not capture. The amount of resource is therefore not the only factor, and the link structure also helps determine the final scores. Thus, from the viewpoint of the proposed algorithm, a well-connected vertex, which can be considered important in graph terms, is favoured over a less connected vertex with the same amount of resource.






5 Acceleration method for HITS 



As shown in Figure 3, the preferential attachment in the WWW network induced from the HITS definition is opposite to the preferential attachment in the trading network. Therefore, the framework used to derive the ranking algorithm for trading networks can be applied back to HITS. To derive the modified HITS formulation, we first discuss Equation (13). It separates the ranking vector r into a buying vector b and a selling vector s, so it has the same form as the HITS formulation given earlier. Comparing the preferential attachment in the trading network in Figure 3(a) with Equation (13) gives insight into the relationship between the preferential attachment and the buying and selling vectors.



5.1 Modified HITS formulation 



As shown in the left-hand side of Figure 3(a), an agent prefers others with many inlinks when receiving a new inlink. In the first part of Equation (13), the ranking score of an agent as a buyer is a weighted sum over the agents from which it receives resources. Each term is that agent's ranking score as a buyer, weighted by its c_a. Because c_a is bigger when an agent has more inlinks than outlinks, the first part of Equation (13) says that an agent should receive new inlinks from others with many inlinks. This is exactly the preferential attachment shown in the left-hand side of Figure 3(a).



The same holds for the selling part (right-hand side of Figure 3(a)): an agent prefers others with many outlinks when creating a new outlink. In the second part of Equation (13), the ranking score of an agent as a seller is a weighted sum over the agents to which it delivers resources. Each term is that agent's ranking score as a seller, weighted by its c_h. Because c_h is bigger when an agent has many outlinks, the second part of Equation (13) says that an agent should create new outlinks to others with many outlinks. This is exactly the preferential attachment shown in the right-hand side of Figure 3(a). Thus, the preferential attachments in Figure 3 can be translated directly into the formulation of the ranking algorithms.

We use this connection to define the modified HITS. As shown in the left-hand side of Figure 3(b), a page prefers others with many outlinks when receiving a new inlink. In the WWW network, inlinks correspond to the authority concept and outlinks to the hub concept. This preferential attachment therefore implies that the authority score of a page is the sum of the hub scores of the pages that point to it, each weighted by that page's c_h. In the right-hand side of Figure 3(b), a page prefers others with many inlinks when creating a new outlink. Consequently, the hub score of a page is the sum of the authority scores of the pages it points to, each weighted by that page's c_a. Thus, the proposed algorithm can be written as:

$$a_i^{(k+1)} = \sum_{j \in B_i} h_j^{(k)} c_{hj} \quad \text{and} \quad h_i^{(k+1)} = \sum_{j \in F_i} a_j^{(k+1)} c_{aj}. \qquad (15)$$

In matrix form:

$$a^{(k+1)T} = h^{(k)T} C_h L \quad \text{and} \quad h^{(k+1)T} = a^{(k+1)T} C_a L^{T} \qquad (16)$$
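Equation (16) can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used in the experiments. It makes these assumptions:

- L[j][i] = 1 when page j links to page i.
- c_a and c_h are caller-supplied lists, and the weights in the example are hypothetical.
- Each iterate is rescaled to unit 1-norm.
- The loop stops once the combined 1-norm residual of a and h falls below tol.

The function name modified_hits is hypothetical. Passing all-ones weights turns the same loop into plain HITS, so the returned iteration counts can be compared on equal terms.

```python
def modified_hits(L, c_a, c_h, tol=1e-8, max_iter=10_000):
    """Iterate Equation (16): a^T <- h^T C_h L, then h^T <- a^T C_a L^T.

    L[j][i] = 1 if page j links to page i. With c_a = c_h = all ones this
    reduces to plain HITS. Returns (a, h, iterations).
    """
    n = len(L)
    a = [1.0 / n] * n  # uniform starting vectors
    h = [1.0 / n] * n

    def normalize(v):
        total = sum(v)
        return [x / total for x in v]

    for k in range(1, max_iter + 1):
        # a_i = sum over j in B_i (pages j -> i) of h_j * c_h[j]
        a_new = normalize([sum(h[j] * c_h[j] * L[j][i] for j in range(n))
                           for i in range(n)])
        # h_i = sum over j in F_i (pages i -> j) of a_j * c_a[j], using the new a
        h_new = normalize([sum(a_new[j] * c_a[j] * L[i][j] for j in range(n))
                           for i in range(n)])
        residual = (sum(abs(x - y) for x, y in zip(a_new, a))
                    + sum(abs(x - y) for x, y in zip(h_new, h)))
        a, h = a_new, h_new
        if residual < tol:
            return a, h, k
    return a, h, max_iter


# Hypothetical 4-page graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2.
L = [[0, 1, 1, 0],
     [0, 0, 1, 0],
     [1, 0, 0, 0],
     [0, 0, 1, 0]]
a_hits, h_hits, k_hits = modified_hits(L, [1] * 4, [1] * 4)              # plain HITS
a_mod, h_mod, k_mod = modified_hits(L, [1, 1, 3, 1], [2, 1, 1, 1])       # hypothetical weights
```

On this toy graph, page 2 (three inlinks) should come out as the top authority under both variants. With unit weights, the authority vector tends to (0, 1 − √2/2, √2/2, 0), the dominant eigenvector of L^T L for this graph normalized to sum to 1.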
Figure 4: The distances between initial and final distributions (x-axis: pages ordering).



As Equation (15) shows, the proposed algorithm is HITS with two constants attached to every page. Because c_a is bigger for an authoritative page and c_h is bigger for a hubby page, the pages that follow the preferential attachment collect their scores faster under the proposed algorithm than under HITS. Thus, the proposed algorithm can be expected to converge faster on datasets that follow the preferential attachment. Previous works (Broder et al., 2000; Albert et al., 1999; Kleinberg et al., 1999) have shown that the WWW network has power-law degree distributions for both inlinks and outlinks, so preferential attachment is present.



Usually, a uniform distribution is used as the starting vector (Langville and Meyer, 2006). Consequently, the distances between the initial and final scores are not the same for all pages. Very authoritative and hubby pages take more iterations to reach their final scores, and so do pages with very low final authority or hub scores. Figure 4 illustrates this condition: the pages ranked at the top and bottom have greater distances between their initial and final scores than the pages in the middle positions. The authority and hub scores are proportional to c_a and c_h. Hence, the distances between the final and initial authority and hub scores are proportional to c_a and c_h, respectively. Thus, the pages ranked at the top (bottom) will reach their stationary values faster under the proposed algorithm because of their bigger (smaller) c_a and c_h.



5.2 Experimental results 



Due to limited space, we present the experimental results and analysis only briefly. More detailed discussions can be found in (Mirzal and Furukawa, 2010).

Six datasets are used in the experiments. They contain from around 10 thousand to 225 thousand pages, with average degrees from 4 to 47. Except for wikipedia (Segaran, 2006), all datasets were crawled using our crawling system (Mirzal, 2009). All datasets except britannica have an average degree typical of the WWW network, around 4 to 15 (Langville and Meyer, 2006; Kamvar et al., 2003). Table 4 summarizes the datasets, where AD denotes the average degree.

3 As shown in Ding et al. (2002, 2004), authority (hub) scores are proportional to the number of inlinks (outlinks), and by definition the c_a (c_h) values are proportional to the number of inlinks (outlinks). Thus, the authority and hub scores are proportional to c_a and c_h, respectively.

Table 4: Datasets summary.

Data            Crawled    #Pages     #Links      AD
britannica      09/2008     21104     994554    47.1
jobs            12/2008     16056     187957    11.7
opera           12/2008     49749     437748     8.8
scholarpedia    06/2008     74243    1077781    14.5
stanford        12/2008    225441    2196441     9.7
wikipedia       09/2006     10431      46152     4.4
Average                     66170     823439

Table 5: Top 10 results with query "programming" for wikipedia dataset.

No.   HITS                                         Prop. Alg.
1     Programming_language                         Programming_language
2     Categorical_list_of_programming_languages    Categorical_list_of_programming_languages
3     C_programming_language                       C_programming_language
4     Functional_programming                       Functional_programming
5     Object-oriented_programming                  Object-oriented_programming
6     Programming_paradigm                         Java_programming_language
7     Java_programming_language                    Programming_paradigm
8     Generic_programming                          Generic_programming
9     Lisp_programming_language                    Lisp_programming_language
10    Ada_programming_language                     Ada_programming_language

The experiments are conducted on a notebook with a 1.86 GHz Intel processor and 2 GB of RAM. The code is written in Python and makes extensive use of a database to store the adjacency matrices, score vectors, and other related data. Figure 5 shows the convergence rates, and Figure 6 shows the processing times needed to reach the same corresponding residual levels. Note that uniform starting vectors are used for all datasets, and the processing times include the c_a and c_h computations.

As shown in Figures 5 and 6, the proposed algorithm in general improves both convergence rates and processing times. In processing time, there are still some cases where the proposed algorithm cannot do better than HITS. In convergence rate, however, it outperforms HITS in all cases. Table 5 gives examples of the top ten pages returned by HITS and the proposed algorithm with the query "programming" for the wikipedia dataset. For brevity, only file names are displayed. To get the full URLs, each name has to be prefixed with http://en.wikipedia.org/wiki/.

6 Conclusion 

We present a link-structure ranking algorithm for trading networks, derived by analyzing the preferential attachment mechanism in these networks. We show that this mechanism in trading networks is opposite to the corresponding mechanism in the WWW network induced from the HITS definition. The differences arise from the nature of the links. In a trading network, the links are flows of resources driven by the principle of supply and demand, the fundamental principle behind trading activities. In the WWW network, the links are hyperlinks that can be created without exchanging any resource. Preferential attachment is the underlying principle behind the HITS formulation, so exploiting these differences allows us to define a link-structure ranking algorithm for trading networks. The distinctive feature of our algorithm is its use of the network structure to determine the ranking scores, an approach that is popular in WWW network research.

Figure 5: Convergence rates comparison, with panels including (e) http://www.stanford.edu and (f) http://www.wikipedia.org. Note that the x-axis is the number of iterations and the y-axis is the residual in log scale.

The proposed algorithm has several possible applications. The most obvious is to use it as a metric of the degree of importance of agents involved in trading activities. The standard method relies on aggregate transaction volumes. The proposed algorithm instead makes use of the network structure and therefore favours agents that are highly connected or that link to (are linked by) other highly connected agents. Thus, the network structure, which is invaluable information in graph theory but is not captured by the standard method, becomes an essential factor in determining the degree of importance. The second possible application is the design of recommendation schemes. For example, in an online auction network with an enormous number of users, the proposed algorithm can help focus efforts on only the most important users relevant to the search queries.

Figure 6: Processing times (in seconds) to reach the same corresponding residual levels for the Britannica, Jobs, Opera, Scholarpedia, Stanford, and Wikipedia datasets.

In the WWW network part, we show that the modified HITS, which favours the preferential attachment, in general has better convergence rates than the original HITS, so it can be used to speed up HITS computations. This is an interesting subject in its own right and has been studied in our other work. Readers can refer to (Mirzal and Furukawa, 2010) for detailed discussions.